High-throughput cloning of protooncogenes

ABSTRACT

The present invention provides a process of identifying protooncogenes using high-throughput provirus tagging (HPT), e.g., by recovering host/virus junction sequences from chimeric transcripts containing both host and virus sequences.

GENERAL BACKGROUND

[0001] Cancer is the phenotypic manifestation of a complex biological progression during which cells accumulate multiple somatic mutations, eventually acquiring sufficient growth autonomy to metastasize. Although inherited cancer susceptibility alleles and epigenetic factors influence the process, carcinogenesis is fundamentally driven by somatic cell evolution (i.e., mutation and natural selection of variants with progressive loss of growth control). The genes which are the targets of these somatic mutations are classified as either protooncogenes or tumor suppressor genes, depending on whether their mutant phenotypes are dominant or recessive, respectively.

[0002] In several animal models, an important source of protooncogene somatic mutations is retrovirus infection. Retroviruses can cause cancer by essentially three mechanisms: (i) transduction of host protooncogenes (which then become viral oncogenes), (ii) trans-acting effects of viral gene products, or (iii) cis-acting effects of provirus integration on protooncogenes at or very near the site of integration. In the latter case, only rare infected cells are affected. This phenomenon is called provirus insertion mutation, and will be discussed in detail in the following narrative.

[0003] As a normal consequence of the retroviral life-cycle, DNA copies of the retrovirus genome (called proviruses) are integrated into the host genome. Accordingly, retroviruses are obligate mutagens. A newly-integrated provirus can affect gene expression in cis at or near the integration site by one of two mechanisms. Type I insertion mutations up-regulate transcription of proximal genes as a consequence of regulatory sequences (enhancers and/or promoters) within the proviral long terminal repeats (LTRs). These insertion mutations typically affect genes that are not expressed in the target tissue. Type II insertion mutations cause truncation of coding regions due to either integration directly within an open reading frame or integration within an intron upstream of the stop codon.

[0004] Provirus integration is random. Therefore, all host genes are targets of insertion mutation. In a chronically-infected tissue, a sufficient number of cells have new provirus insertions that, statistically, all genes in the genome are mutated. In rare cases, an insertion mutation will “activate” a host protooncogene, providing the affected cell with a dominant selective growth advantage in vivo. If the cell progresses to cancer, then the protooncogene insertion mutation will be present at clonal stoichiometry in the tumor. Such “clonally-integrated” proviruses serve to “tag” the locations of protooncogenes in the genome. In cases where the proviral enhancer is responsible for dysregulation of the mutated protooncogene, the provirus can be 100 kb or more from the affected gene (but is usually much closer).

[0005] This relatively tight linkage between clonally-integrated proviruses and protooncogenes is the basis for a classical experimental strategy, called “provirus tagging,” in which slow-transforming retroviruses that act by an insertion mutation mechanism are used to isolate protooncogenes. The complete logic is as follows: (i) uninfected animals have low cancer rates, (ii) infected animals have high cancer rates, (iii) the retroviruses involved do not carry transduced host protooncogenes or pathogenic trans-acting viral genes, (iv) the cancer incidence must therefore be a direct consequence of provirus integration effects on host protooncogenes, (v) since provirus integration is random, rare integrants will “activate” host protooncogenes that provide a selective growth advantage, and (vi) these rare events result in new proviruses at clonal stoichiometries in tumors.

[0006] In contrast to mutations caused by chemicals, radiation, or spontaneous errors, protooncogene insertion mutations can be easily located by virtue of the fact that a convenient-sized genetic marker of known sequence is present at the site of mutation (i.e., the provirus). Host sequences that flank clonally-integrated proviruses can be recovered using a variety of molecular techniques. Once these sequences are in hand, the tagged protooncogenes can be subsequently identified.

[0007] There are two unequivocal biological criteria that provide prima facie evidence that a protooncogene is present at or very near a proviral integration site. The first criterion is the presence of provirus at the same locus in two or more independent tumors. This is because the genome is too large for random integrations to result in observable clustering. Any clustering that is detected is indirect evidence for biological selection (i.e., the tumor phenotype resulting from activation of a host protooncogene). The second criterion is a tumor with only a single insertion mutation. In this case, if there is only one insertion mutation, then that provirus is located at a protooncogene locus. If either of these criteria is met, sufficient evidence exists to reach a conclusion that a protooncogene locus has been located.
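The first criterion can be illustrated with a minimal sketch. It assumes that integration sites have already been mapped to chromosome coordinates, which is not a step described here. The function name, the 100 kb grouping window (borrowed from the enhancer range mentioned in paragraph [0004]), and the sample coordinates are all hypothetical.

```python
from collections import defaultdict


def common_integration_sites(insertions, window=100_000):
    """Return loci hit by proviruses in two or more independent tumors.

    insertions: iterable of (tumor_id, chromosome, position) tuples.
    window: hypothetical maximum gap (bp) between neighbouring insertions
            that are still treated as the same locus.
    """
    by_chrom = defaultdict(list)
    for tumor, chrom, pos in insertions:
        by_chrom[chrom].append((pos, tumor))

    results = []
    for chrom, hits in by_chrom.items():
        hits.sort()
        cluster = [hits[0]]
        # Sentinel (None) flushes the final cluster at the end of the chromosome.
        for hit in hits[1:] + [None]:
            if hit is not None and hit[0] - cluster[-1][0] <= window:
                cluster.append(hit)
                continue
            tumors = sorted({t for _, t in cluster})
            if len(tumors) >= 2:  # same locus in independent tumors
                results.append((chrom, cluster[0][0], cluster[-1][0], tumors))
            if hit is not None:
                cluster = [hit]
    return sorted(results)


# Hypothetical data: two tumors share a locus on chr15; the others are singletons.
example = [
    ("T1", "chr15", 61_000_000),
    ("T2", "chr15", 61_040_000),
    ("T3", "chr7", 5_000_000),
    ("T1", "chr7", 90_000_000),
]
print(common_integration_sites(example))
```

Two insertions from the same tumor do not count as a cluster in this sketch, because the criterion requires independent tumors.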

[0008] The provirus tagging concept has withstood two decades of testing in many retrovirus tumor models that have a provirus insertion mutation etiology. The biological logic is so compelling, and the experimental results so unequivocal, that the claim can be made that the activated genes are functionally validated as protooncogenes at the time of discovery. Formal confirmation typically involves isolation of a full-length cDNA for use in a bioassay (either a cell-based transformation assay or transgenic mice).

[0009] Provirus tagging in avian and mammalian systems has led to the identification of approximately 50-60 protooncogenes (many of which were new genes not previously identified by other techniques). The three mammalian retroviruses that cause cancer by an insertion mutation mechanism are FeLV (leukemia/lymphoma in cats), MLV (leukemia/lymphoma in mice and rats), and MMTV (mammary cancer in mice).

[0010] Despite the tremendous promise of the provirus tagging approach, as originally designed it was not well-suited for large scale application. The main problem was that it was too laborious and, therefore, the risks of reisolating known genes became unacceptable for most investigators. As a consequence, the protooncogene discovery potential of this approach has remained largely untapped.

[0011] Recognizing this untapped potential, we designed and implemented HPT to overcome the limitations of the original provirus tagging approach (which were all fundamentally related to throughput). We were able to successfully increase provirus tagging throughput to the point where reisolation of known loci is no longer a problem. In fact, this is now a desirable outcome because it serves as an “internal control” that helps validate the biological relevance of the new genes that are recovered in parallel.

[0012] As a functional oncogenomics strategy, HPT has many advantages. First, it is a functional cloning approach rather than a brute-force (e.g., differential display-based) one, and the genes that are recovered are functionally validated at the time of discovery. Second, it has high biological relevance since protooncogenes are isolated directly from clinical material (rather than from cell lines, transplants, or materials generated by gene transfer). Third, it is amenable to automation, meaning that throughput and time-to-discovery are a simple function of research resources.

[0013] The invention is a process called high-throughput provirus tagging (HPT). HPT yields partial protooncogene cDNAs from retrovirus-induced tumors. Using these partial cDNAs, conventional techniques can be used to recover full-length cDNAs (we have not yet performed this final step). A conceptual diagram is shown in Appendix A and a flow chart of the process is shown in Appendix B.

[0014] HPT is derived from classical procedures for provirus tagging (see Appendix C for background information). It is specific for tumors induced by retroviruses that cause cancer via a provirus insertion mutation mechanism. This subset of retroviruses includes the mouse mammary tumor virus (MMTV). MMTV-induced tumors were used to implement the HPT process.

[0015] In tumors induced by provirus insertion mutation, new proviral integrants present at clonal stoichiometries tag the locations of host protooncogenes. The majority of such integrants fall outside of transcribed regions. However, a subset fall within sequences that are transcribed, and result in the formation of chimeric transcripts containing both host and virus sequences. HPT is designed to recover host/virus junction sequences from these chimeric transcripts.

[0016] The strategy used is a modified/optimized anchored-PCR (A-PCR) approach incorporating a custom anchor. The procedure amplifies host sequences upstream of 5′ LTRs. If a transcript containing a host/virus junction is present in a tumor, then a unique fragment is generated by the A-PCR procedure, which can be detected by gel electrophoresis. In addition, one or more common fragments will be generated from retroviral transcripts that contain the 5′ end of the 3′ LTR.
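The distinction between common and unique fragments can be sketched as follows. The sketch assumes that band sizes have already been estimated from the gel for each tumor. The function name, the size tolerance, and the sample sizes are hypothetical. Treating "common" as "present in every lane" is a simplification made for illustration.

```python
def classify_bands(lanes, tolerance=5):
    """Split each lane's bands into common and unique fragments.

    lanes: dict mapping tumor_id -> list of fragment sizes in bp.
    tolerance: hypothetical sizing error (bp) within which two bands
               are treated as the same fragment.
    A band is 'common' if a matching band appears in every other lane;
    otherwise it is reported as a candidate 'unique' fragment.
    """
    def present(size, sizes):
        return any(abs(size - s) <= tolerance for s in sizes)

    result = {}
    for tumor, sizes in lanes.items():
        others = [s for t, s in lanes.items() if t != tumor]
        common = [x for x in sizes
                  if others and all(present(x, o) for o in others)]
        unique = [x for x in sizes if x not in common]
        result[tumor] = {"common": common, "unique": unique}
    return result


# Hypothetical gel: a band of about 450 bp in every lane, plus extra bands in T1 and T3.
lanes = {"T1": [450, 120], "T2": [452], "T3": [449, 260]}
for tumor, bands in classify_bands(lanes).items():
    print(tumor, bands)
```

A fragment that recurs in some but not all tumors is still reported as unique here. Such recurrence would be of interest in its own right, since it may indicate a shared integration locus.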

[0017] The innovation that makes this approach feasible is that cDNAs are digested with a restriction enzyme with a 4 bp recognition sequence prior to amplification. This generates populations of target cDNAs that (1) have precise 5′ ends, and (2) are sufficiently small to ensure that they will efficiently amplify. In addition, restriction enzymes are selected that produce the largest possible retroviral transcription products (so that they run at the top of the gel). This is critical because chimeric transcripts are present at much lower levels than the major retroviral transcripts. By selection of appropriate restriction enzymes, a large detection window is available in a region of the gel where the signal-to-noise ratio is most favorable. In addition, during amplification, cycling times are ramped to favor smaller products.
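The effect of digesting with a 4 bp cutter can be illustrated with a small in silico sketch. The recognition sequence CATG is an assumption, chosen only because the sequences recited at the end of this document all begin with catg. The enzyme is not named in the text. Cutting at the start of the site is a simplification, and the function names are hypothetical. For a random sequence with equal base frequencies, a 4 bp site is expected once every 4^4 = 256 bp, which is why the resulting targets are small.

```python
def digest(seq, site="CATG"):
    """Split seq immediately before every occurrence of site.

    Every fragment after the first begins with the recognition site,
    mimicking the precise 5' ends produced by digestion. Real enzymes
    cut at a fixed offset within the site; offset 0 is used here for
    simplicity. The default site is an assumption, not from the source.
    """
    seq = seq.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start)
        start = seq.find(site, start + 1)
    # Fragment boundaries: sequence start, each internal cut, sequence end.
    bounds = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def expected_fragment_length(site_length=4):
    """Mean spacing of a site in random sequence with equal base frequencies."""
    return 4 ** site_length


print(digest("aaCATGttttCATGcc"))
print(expected_fragment_length())
```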

[0018] The provirus tagging strategy has been used for almost 20 years. It is a DNA-based detection method where identification of new genes requires positional cloning procedures to find genes adjacent to integration sites recovered from tumor DNA. This laborious process has been recently improved by PCR procedures. Nevertheless, unless the integration falls within known sequence, it is not possible to identify the affected gene without a large amount of additional work.

[0019] The advantage of HPT is that it is the first PCR-based provirus tagging approach that recovers protooncogenes from RNA. Because RNA is used, new protooncogenes are identified directly. Although only a fraction of tumors have insertion mutations that generate a chimeric transcript, the process has been designed to be high-throughput. As a consequence, the fact that most samples are non-informative is not a problem. In addition, the process is so efficient that recovery of known protooncogenes does not represent an unacceptable loss of effort, and, in fact, serves as an internal control to verify the robustness of the strategy.

SEQUENCE LISTING

Number of sequences: 18

SEQ ID NO: 1 (length 101, DNA, Mouse)
catggcgaga ttctgtgtcc aagctgcctc tactcgtgac attccaagat gcctctgagg 60
tgggaactgt gaaataggac agagccccac agtcccctct t 101

SEQ ID NO: 2 (length 75, DNA, Mouse)
catggcaaga tggagacttt gtctaccagg gccactccaa gcacccagct gcatacaggt 60
ggactggctg tggcc 75

SEQ ID NO: 3 (length 46, DNA, Mouse)
catgctggct gttcctgcag cccagctact gggacaatct ggaaac 46

SEQ ID NO: 4 (length 18, DNA, Mouse)
catgtgctca atccatag 18

SEQ ID NO: 5 (length 79, DNA, Mouse)
catgggtccc tgaagggtct ctcctttagc aaacccctgt acagttgaag tgatttttca 60
ggtacccatt ggtcttagc 79

SEQ ID NO: 6 (length 51, DNA, Mouse)
catggcaaga tggagacttt gtctaccagg gccactccaa gcacccagct g 51

SEQ ID NO: 7 (length 260, DNA, Mouse)
catgcacaca aactggccct gaacttttga cttccaggcc tctgcctctc tgcgcgcaca 60
cacacactcg cactcctgta tatgaagcgt atatgtgttt ctctgggaac tgtttttatc 120
aggtgaagta cttcctttgt tcttgctacc cacctccagg gctccaggat ctccagacag 180
ccaaccctaa gacaggccca gcttctctgt atctctgtga tgagaacctt ggcatagagc 240
tgcctcaccc tcgggatagg 260

SEQ ID NO: 8 (length 45, DNA, Mouse)
catgcctctg gaaagtacct taaacataga atcccctccc tagtg 45

SEQ ID NO: 9 (length 31, DNA, Mouse)
catggttttt ttttttttga gtgtgtgtgt g 31

SEQ ID NO: 10 (length 46, DNA, Mouse)
catgcagatt aaagtacata tatgtaaaaa ataaaaataa atcttt 46

SEQ ID NO: 11 (length 201, DNA, Mouse)
catgataagg ttagagtttt gtgagcctcc ttaaccttgc tcagcaagcg ttgggctctt 60
ggcagccgag ctgccatctt tctcatcccc gatagagcca gccgcccttg tcgtgtcttg 120
aataagttag aggaggcatt atagagcgga cctaaacatt tgccttggag cctgagggat 180
ggggattggc tgaatgtgaa t 201

SEQ ID NO: 12 (length 122, DNA, Mouse)
catgaattca tcactggtaa aatgtatgaa tttcttctga gacagagtct tcttattggc 60
ttacacttgc ttcgagcgga tgattctgct gcttcagcct cttgagatgc tcagatatgt 120
gc 122

SEQ ID NO: 13 (length 16, DNA, Mouse)
catggatgct attggg 16

SEQ ID NO: 14 (length 22, DNA, Mouse)
catgagaggg tgcttcaggg tg 22

SEQ ID NO: 15 (length 322, DNA, Mouse)
catgcacaca aactggccct gaacttttga cttccaggcc tctgcctctc tgcgcgcaca 60
cacacactcg cactcctgta tatgaagcgt atatgtgttt ctctgggaac tgtttttatc 120
aggtgaagta cttcctttgt tcttgctacc cacctccagg gctccaggat ctccagacag 180
ccaaccctaa gacaggccca gcttcctctg tatctctgtg atgagaacct tggcatagag 240
ctgccctcac cctcgggata gggcttatgt tccccggaac gagccaggca cctcaacagc 300
tcctggggag gaatagggga ct 322

SEQ ID NO: 16 (length 48, DNA, Mouse)
catgaattcc acacctccat caagggtgtc ttctccagtg agccccgg 48

SEQ ID NO: 17 (length 158, DNA, Mouse)
catgcctccc tcagcctcct cccacccctt cctgtcctgc ctcctcatca ctgtgtaaat 60
aatttgcacc gaaatgtggc cgcagagcca cgcgttcggt tatgtaaata aaactattta 120
ttgtgctggg ttccagcctg ggttgcagag accaccct 158

SEQ ID NO: 18 (length 68, DNA, Mouse)
catgctaatg gagtttattc ttaggactgc ctcctgcatc cattgattga cttaaatatg 60
tgcacact 68

I claim:
 1. A method of identifying protooncogenes comprising: inserting a provirus into the genome of a host forming a junction site of DNA of said virus and said host; isolating mRNA from said host; preparing cDNA from said mRNA; amplifying said cDNA to identify the nucleic acid sequence of said junction site, whereby said candidate target gene is identified. 